Abstract

We define a new model of quantum learning that we call Predictive Quantum (PQ).
This is a quantum analogue of PAC, where during the testing phase the student is only
required to answer a polynomial number of testing queries.

We demonstrate a relational concept class that is efficiently learnable in PQ, while in
any "reasonable" classical model an exponential amount of training data would be required.
This is the first unconditional separation between quantum and classical learning.

We show that our separation is the best possible in several ways; in particular, there 
is no analogous result for a functional class, as well as for several weaker versions of 
quantum learning. 

In order to demonstrate tightness of our separation we consider a special case of 
one-way communication that we call single-input mode, where Bob receives no input. 
Somewhat surprisingly, this setting becomes nontrivial when relational communication 
tasks are considered. In particular, any problem with two-sided input can be transformed
into a single-input relational problem of equal classical one-way cost. We show that the
situation is different in the quantum case, where the same transformation can make the
communication complexity exponentially larger. This happens if and only if the original
problem has an exponential gap between quantum and classical one-way communication
costs. We believe that these auxiliary results might be of independent interest.

1 Introduction 

In this paper we compare quantum and classical modes of computational learning and give 
the first unconditional exponential separation between the two. 

Let $X$ be a (finite) domain and $Y$ be a set of possible labels. Let $\mathcal{C}$ be a concept class
consisting of functions $\ell : X \to Y$; each $\ell \in \mathcal{C}$ can be viewed as an assignment of a label to every
$x \in X$. The knowledge of $X$, $Y$ and $\mathcal{C}$ is shared between a teacher and a learner; the teacher
also knows some target concept $\ell_0 \in \mathcal{C}$, unknown to the learner. The learning process consists
of two stages: the learning phase, followed by the testing phase. In the learning phase, the
teacher and the learner communicate in order to let the latter learn $\ell_0$. In the testing phase,
the learner has to demonstrate that he has successfully learned $\ell_0$: for example, an arbitrary
$x \in X$ may be given to him, and he would have to respond with $\ell_0(x)$.

A learning model specifies the set of rules governing the learning and the testing phases. 
The teacher is, in general, viewed as an adversary that obeys the model's restrictions. 



One of the most natural and widely used learning models is that of Probably Approximately
Correct (PAC), defined by Valiant [V84]. In the learning phase of PAC a sequence of training
examples

$$(x_1, \ell_0(x_1)), \ldots, (x_k, \ell_0(x_k))$$

is sent by the teacher to the learner. The examples are independently chosen according to
some distribution $D$ over the domain $X$.^1 In the testing phase the learner is given a random
$x \sim D$ and has to respond with $\ell_0(x)$.

Two error parameters are present in the definition of PAC: accuracy $1 - \varepsilon$ and confidence
$1 - \delta$. We say that learning was successful if in the testing phase the learner correctly labels
a randomly chosen $x \sim D$ with probability at least $1 - \varepsilon$. A learning algorithm must be
successful with probability at least $1 - \delta$, taken over both the algorithm's randomness and the
set of examples received during the learning phase.

We say that a concept class $\mathcal{C}$ is efficiently learnable in PAC if there exists an algorithm
that runs in time at most polylogarithmic in the domain size and polynomial in $1/\varepsilon$ and $1/\delta$,
and learns any $\ell \in \mathcal{C}$. Note that the running time of an algorithm is, trivially, an upper
bound on the number of training examples that it uses during the learning phase.

1.1 Previous work 

In [BJ95] Bshouty and Jackson introduced a natural quantum analogue of PAC, which we
denote here by QAC. They gave an efficient algorithm that learns DNF formulas w.r.t. the
uniform distribution from quantum examples; this is currently not known to be possible
from classical examples (even with a quantum learning algorithm).

The question of whether quantum learning models are more efficient than the classical
ones has been considered by Servedio and Gortler [SG04], who showed that the models PAC
and QAC are equivalent from the information-theoretic point of view. On the other hand,
they showed that quantum models are computationally more powerful than their classical
analogues if certain cryptographic assumptions hold.

1.2 Our results 

In the definition of a new learning model PQ (Predictive Quantum) we will generalize QAC 
in several ways. 

First, we allow relational concept classes. Namely, the elements $\ell$ of $\mathcal{C}$ can be arbitrary
subsets of $X \times Y$, thus allowing multiple correct labellings for every $x \in X$. During the
learning phase the learner receives pairs $(x_i, y_i)$, such that $x_i \sim D$ and $y_i$ is a uniformly
random element of $\{y \mid (x_i, y) \in \ell_0\}$. At the testing phase any $y$ satisfying $(x, y) \in \ell_0$ is
accepted as a correct answer to the query $x$.
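The way relational training examples are drawn can be illustrated by a small classical sketch. The toy relation, the helper name `draw_relational_example` and the fixed seed below are assumptions made only for illustration; the sketch fixes $D$ to be uniform over the domain.

```python
import random

def draw_relational_example(relation, domain, rng):
    """Draw one training pair (x, y): x is uniform over the domain and
    y is uniform among the labels that the relation allows for x."""
    x = rng.choice(domain)
    valid_labels = sorted(y for (x2, y) in relation if x2 == x)
    return x, rng.choice(valid_labels)

# Hypothetical toy relation over X = {0, 1, 2} and Y = {"a", "b"};
# the element 0 has two correct labels, the others have one each.
toy_relation = {(0, "a"), (0, "b"), (1, "a"), (2, "b")}
rng = random.Random(0)
examples = [draw_relational_example(toy_relation, [0, 1, 2], rng) for _ in range(5)]
```

Every drawn pair belongs to the relation, and any label allowed for the queried element counts as correct in the testing phase.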

Second, we classify all learning models as follows: 

• We call standard a learning model where in the testing phase the learner outputs a final
hypothesis, viewed as a function $h : X \to Y$. In the testing phase it is checked whether
$h(x)$ agrees well with the target concept. The final hypothesis should be efficiently
evaluatable (under the same notion of efficiency that applies to the learning algorithms
in the model).

• We say that a model is quasi-predictive if the learner has to answer queries in the testing
phase. The number of testing queries that will be asked is unknown during the learning
phase.

• We call a model predictive if the learner should answer a single query in the testing
phase.^2

^1 Several variations of PAC are studied in the literature, in particular there is a definition that allows
"distribution-specific" learning algorithms. In this paper we will always fix $D$ to be the uniform distribution
over $X$.

For example, the PAC model, as defined above, is predictive. If we would allow an
arbitrary number of testing queries, that would make it quasi-predictive. If we require that
in the end of the learning phase the learner produces a hypothesis $h : X \to Y$, such that
$\Pr_{x \sim D}[h(x) = \ell(x)] \ge 1 - \varepsilon$, that turns the model into standard.

As long as the learning phase remains unchanged, standard learnability of a concept implies
its quasi-predictive learnability, which, in turn, implies predictive learnability. On the
other hand, it is well known that in any "reasonable" classical learning model, a predictive
learning algorithm can be turned into a standard one (this can be achieved by producing a
final hypothesis consisting of a description of the answering subroutine, all the data available
after the learning phase, and a random string, if randomness is used by the answering
subroutine). Therefore, in the classical case the standard, the quasi-predictive, and the predictive
modes of learning are essentially equivalent; in particular, the above three definitions
of PAC give rise to the same family of efficiently learnable concept classes. We will see that
the situation is different with quantum learning.
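The classical conversion of a predictive learner into a standard one can be sketched as follows. The function names and the toy answering subroutine are hypothetical; the only point is that the answering subroutine, the stored data and a fixed random string together form a deterministic, reusable hypothesis.

```python
import random

def predictive_to_standard(answer_query, training_data, seed):
    """Package the answering subroutine, all data left after the learning
    phase, and a fixed random string into one final hypothesis h."""
    def h(x):
        rng = random.Random(seed)  # replay the same random string on every call
        return answer_query(training_data, x, rng)
    return h

def majority_answer(training_data, x, rng):
    """Hypothetical answering subroutine: majority label seen for x,
    or a random guess if x never occurred in the training data."""
    labels = [y for (x2, y) in training_data if x2 == x]
    if not labels:
        return rng.choice([0, 1])
    return max(set(labels), key=labels.count)

data = [(0, 1), (0, 1), (1, 0)]
h = predictive_to_standard(majority_answer, data, seed=42)
```

A quantum learner cannot in general be frozen like this, because answering a query may consume the quantum state obtained in the learning phase.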

For the rest of the paper let $n \stackrel{\mathrm{def}}{=} \lceil \log |X| \rceil$. Consider the following definition.

Definition 1. Let $D$ be a distribution over $X$. We say that a hypothesis $h : X \to Y$
approximates a concept $\ell \in \mathcal{C}$ w.r.t. $D$ if

• $\Pr_{x \sim D}[h(x) = \ell(x)] \ge 2/3$, when $\ell$ is a function;

• $\Pr_{x \sim D}[(x, h(x)) \in \ell] \ge 2/3$, when $\ell$ is a relation.

If a hypothesis class $\mathcal{H}$ contains, for every $\ell \in \mathcal{C}$, at least one function that approximates $\ell$, then we
say that $\mathcal{H}$ approximates $\mathcal{C}$.

Any standard algorithm that learns $\mathcal{C}$ with $\varepsilon < 1/3$ must use a class of final hypotheses
that approximates $\mathcal{C}$. An efficient algorithm can use a class of final hypotheses of size at most
exponential in poly($n$). As outlined above, efficient learnability in any classical model implies
efficient learnability in the corresponding standard model, and therefore $\mathcal{C}$ is efficiently learnable
in some classical model only if there exists $\mathcal{H}$ of size at most $2^{\mathrm{poly}(n)}$ that approximates
$\mathcal{C}$.

We call a concept class $\mathcal{C}$ unspeakable if any class $\mathcal{H}$ that approximates it should be of size
at least $2^{2^{\Omega(n)}}$. In particular, neither a classical algorithm nor a standard quantum algorithm
can efficiently learn an unspeakable concept class.

^2 Note that a concept class that is efficiently learnable by our definition of predictive learning is also
efficiently learnable in a version where a polynomial number of testing queries are made. For notational convenience
we will use the single-query definition of predictive learnability.

In this paper we demonstrate an efficient quantum predictive algorithm that learns an
unspeakable relational concept class. Therefore, quantum predictive learnability does not imply
quantum standard learnability. On the other hand, we will show that no quasi-predictive
quantum algorithm can efficiently learn an unspeakable concept class. We also show that efficient
quantum learning of a functional unspeakable concept class is impossible, and therefore
the combination of relational concepts and quantum predictive mode of learning is essential
for learning an unspeakable class.

Following is a summary of our main results (cf. Theorem 3.1, Lemma 4.4 and Lemma 4.5).

Theorem 1.1. There exists a relational concept class that is unspeakable but can be efficiently 
learned in the model of predictive quantum PAC.

A concept class $\mathcal{C}$ that witnesses the above theorem is given in Definition 4. Its construction
has been inspired by a communication problem due to Bar-Yossef, Jayram and
Kerenidis [BJK04].

Theorem 1.2. Classical learning of an unspeakable concept class is not possible from less 
than exponential amount of information from the teacher, even by a computationally unlimited 
learner. 

Both standard and quasi-predictive learning of an unspeakable concept class is not possible 
from less than exponential amount of quantum (w.l.g.) information from the teacher, even 
by a computationally unlimited learner. 

Predictive learning of an unspeakable functional concept class is not possible from less
than exponential amount of quantum (w.l.g.) information from the teacher, even by a
computationally unlimited learner.

Two parts of Theorem 1.2 are proved by making connection to two "impossibility of
separation" results in communication complexity. One of them is due to Aaronson [A04],
and the other is new and might be of independent interest.

We will consider a special case of one-way communication, which we will call single-input
mode, where Bob receives no input. We show that, somewhat surprisingly, for any single-input
communication task the quantum and the classical one-way costs are asymptotically the
same (the statement is trivial for functional tasks, but the relational case is more involved).
More details can be found in Section 4.2.

2 Definitions and more 

For $a \in \mathbb{N}$ we denote $[a] \stackrel{\mathrm{def}}{=} \{1, \ldots, a\}$. We view the elements of $\mathbb{Z}_a$ as integers $\{0, 1, \ldots, a-1\}$,
and accordingly we define their ordering $0 < 1 < \cdots < a-1$. For any $i \in \mathbb{N}$ and $b \in \mathbb{Z}_a$, let
$i \cdot b = ib$ be the $i$'th power of $b$ w.r.t. the group operation $+$.

A good survey of quantum vs. classical learning can be found in [SG04]. We will usually
ignore normalization factors and global phases of quantum states. We define a predictive
quantum version of PAC, as follows.

Definition 2. In the PQ (Predictive Quantum) learning model, a learning algorithm can
ask for arbitrarily many copies of the state

$$\sum_{(x,y) \in \ell_0} \frac{1}{\sqrt{|\{y' \mid (x,y') \in \ell_0\}|}}\, |x, y\rangle,$$

where $\ell_0$ is a relational target concept. In the end of the learning process the algorithm
receives an element $x \in X$ and should, with probability at least 5/6, output any $y$ satisfying
$(x,y) \in \ell_0$.

A learning algorithm is efficient if its running time is at most polynomial in $n \stackrel{\mathrm{def}}{=} \lceil \log |X| \rceil$.
A concept class $\mathcal{C}$ is efficiently learnable in PQ if there exists an efficient algorithm that
PQ-learns every $\ell \in \mathcal{C}$.

In the above definition the relative amplitudes of the pairs $|x,y\rangle$ in a training example
are chosen such that a projective measurement in the computational basis would result in a
uniformly chosen $x$, and given $x$, all elements of $\{y' \mid (x,y') \in \ell_0\}$ are equally likely to come
with it. Therefore, the model can be viewed as a natural quantum generalization of the
relational version of PAC, as discussed in the Introduction.

The fact that all quantum training examples are the same lets us get rid of the confidence
parameter ($\delta$) in the definition of PQ (there is no such thing as an "unlucky" sample of training
examples). For simplicity, we choose the required accuracy ($\varepsilon$) to always be 5/6. Note also
that in the testing phase we want the learning algorithm to give a correct answer to any
$x \in X$ with good probability (instead of just being able to cope with a randomly chosen $x$).
This further simplifies the definition and also makes our result stronger (as we construct a
PQ-algorithm, and do not state any lower bound against this model).

Let $D$ be the uniform distribution over $X$; recall Definition 1.

Definition 3. Let $\mathcal{C}$ be a concept class. We say that $\mathcal{C}$ is unspeakable if $|\mathcal{C}'| \in 2^{2^{\Omega(n)}}$ holds
for any $\mathcal{C}'$ that approximates $\mathcal{C}$ w.r.t. $D$.

3 Concept class C 

We define a concept class $\mathcal{C}$ that will be shown to be both unspeakable and efficiently
PQ-learnable. Our definition has been inspired by a communication problem considered
in [BJK04].

Definition 4. Let $N$ be prime. Every concept in the class $\mathcal{C}$ is represented by $C \in \{0,1\}^N$.
The set of queries is $[N-1]$, represented by binary strings of length $n = \lceil \log N \rceil$. A pair
$(x, b) \in \mathbb{Z}_N \times \{0,1\}$ is a valid answer to query $j$ w.r.t. $C \in \mathcal{C}$ if $C_x \oplus C_{x+j} = b$.

We slightly abuse the notation by viewing each $C \in \mathcal{C}$ either as a binary string of length
$N$ or as a set $\{(j, x, b) \mid (x, b) \text{ is a valid answer to } j \text{ w.r.t. } C\}$.
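Definition 4 and the two views of a concept can be made concrete with a short sketch. The function names and the toy string below are hypothetical, and indices are taken modulo $N$ as in $\mathbb{Z}_N$.

```python
def is_valid_answer(C, j, x, b):
    """Check whether (x, b) is a valid answer to query j w.r.t. the concept
    given by the bit string C, i.e. whether C_x XOR C_{x+j} equals b."""
    N = len(C)
    return (C[x] ^ C[(x + j) % N]) == b

def concept_as_set(C):
    """The same concept viewed as the set of all triples (j, x, b) such that
    (x, b) is a valid answer to query j, for queries j in {1, ..., N-1}."""
    N = len(C)
    return {(j, x, C[x] ^ C[(x + j) % N]) for j in range(1, N) for x in range(N)}

toy_C = [0, 1, 1, 0, 1]  # hypothetical concept with N = 5 (a prime)
toy_set = concept_as_set(toy_C)
```

Each query has exactly $N$ valid answers, one for every $x \in \mathbb{Z}_N$, which is what makes the concept relational.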

Theorem 3.1. The concept class C is unspeakable. On the other hand, C is efficiently 
learnable in PQ. 

The two parts of the theorem will be proved in Sections 3.2 and 3.1, respectively. The key
observation that we use to efficiently learn $\mathcal{C}$ is the following (originating from [KW04]). Let
a binary string $x \in \{0,1\}^n$ be represented as a quantum state $|\alpha(x)\rangle = \sum_i (-1)^{x_i} |i\rangle$, where
$i$ ranges in $[n]$. Even though it is impossible to recover individual bits of $x$ by measuring
$|\alpha(x)\rangle$, there is something nontrivial about $x$ that can be learned from $|\alpha(x)\rangle$. Namely, given
any perfect matching $M$ over $[n]$, it is possible to measure $|\alpha(x)\rangle$ in such a way that for some
$(i,j) \in M$ the value of $x_i \oplus x_j$ would become known after the measurement. The quantum
state $|\alpha(x)\rangle$ fits in $\lceil \log n \rceil$ qubits; on the other hand, it can be shown that the amount of
classical information needed to allow similar type of access to $x$ is $n^{\Omega(1)}$, and this is used to
show that $\mathcal{C}$ is unspeakable.

3.1 Efficient PQ-learning of C

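Our learner will need $k$ PQ-examples in order to answer the testing query with probability
$1 - 1/2^k$, and whenever an answer is given it is correct.^3 Fix $C \in \mathcal{C}$, then the training
examples are of the form

$$|\alpha_C\rangle = \sum_{(j,x,b) \in C} |j, x, b\rangle = \sum_{j \in [N-1],\, x \in \mathbb{Z}_N} |j\rangle |x\rangle |C_x \oplus C_{x+j}\rangle.$$

The learner measures the last register of each of the $k$ instances of $|\alpha_C\rangle$ in the basis $\{|0\rangle + |1\rangle, |0\rangle - |1\rangle\}$.
With probability $1 - 1/2^k$ at least one measurement results in $|0\rangle - |1\rangle$,
then the learner keeps that copy and abandons the rest (otherwise he gives up). Next, the
learner measures the second register in the computational basis, thus obtaining in the first
two registers

$$\sum_{(j,x_0,b) \in C} (-1)^b |j, x_0\rangle = \sum_{j \in [N-1]} (-1)^{C_{x_0} \oplus C_{x_0+j}} |j, x_0\rangle = \sum_{j \in [N-1]} (-1)^{C_{x_0+j}} |j, x_0\rangle$$

for some $x_0 \in \mathbb{Z}_N$. Then he performs the transformation $|j, x_0\rangle \to |j + x_0, x_0\rangle$, and the state
of the first register becomes

$$|\alpha_{x_0}\rangle = \sum_{j \in [N-1]} (-1)^{C_{x_0+j}} |x_0 + j\rangle = \sum_{k \in \mathbb{Z}_N \setminus \{x_0\}} (-1)^{C_k} |k\rangle.$$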

At this point the learner is ready for the testing phase. Assume that a question $q \in [N-1]$
has been asked. Define the following perfect matching over $\mathbb{Z}_N \setminus \{x_0\}$:

$$m_q \stackrel{\mathrm{def}}{=} \left\{ \big(x_0 + (2i+1)q,\ x_0 + (2i+2)q\big) \,\middle|\, 0 \le i \le \frac{N-3}{2} \right\}.$$

Pairwise disjointness of the edges and the fact that $x_0$ is isolated follow from primality of
$N$. The learner performs projective measurement of $|\alpha_{x_0}\rangle$ onto $(N-1)/2$ subspaces, each
spanned by a pair of vectors $|a\rangle$ and $|b\rangle$ where $a$ and $b$ are connected in $m_q$ (to make the
measurement complete we add $|x_0\rangle\langle x_0|$ to it, but this outcome never occurs).
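The matching used in the testing phase can be checked numerically for small primes. In the sketch below the function name `matching` is hypothetical, and the index range $i = 0, \ldots, (N-3)/2$ is an assumption inferred from the stated number of edges, $(N-1)/2$.

```python
def matching(N, x0, q):
    """Edges (x0 + (2i+1)q, x0 + (2i+2)q) over Z_N, for i = 0, ..., (N-3)/2.
    For prime N and q in {1, ..., N-1} this pairs up all of Z_N except x0."""
    return [((x0 + (2 * i + 1) * q) % N, (x0 + (2 * i + 2) * q) % N)
            for i in range((N - 1) // 2)]

# Hypothetical small instance: N = 7, x0 = 3, question q = 2.
toy_edges = matching(7, 3, 2)
```

Since $N$ is prime, the multiples $q, 2q, \ldots, (N-1)q$ are distinct and nonzero modulo $N$, so every element other than $x_0$ is covered exactly once and the two endpoints of every edge differ by $q$.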

Assume that the outcome of the last measurement corresponds to the edge $(a, a+q) \in m_q$.
Then the state of the register that contained $|\alpha_{x_0}\rangle$ becomes either $|a\rangle + |a+q\rangle$ or $|a\rangle - |a+q\rangle$,
the former corresponding to $C_a \oplus C_{a+q} = 0$ and the latter to $C_a \oplus C_{a+q} = 1$. As the two
states are orthogonal, the learner is able to distinguish and, respectively, answer $(a, 0)$ in the
first case and $(a, 1)$ in the second, and that is a correct answer.
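The testing-phase measurement can be simulated classically for small $N$ by tracking amplitudes directly. This is only an illustrative sketch, not an efficient implementation: the function name is hypothetical, the simulator reads $C$ to prepare the post-learning state (a real learner holds only the quantum state), and the matching index range is the same assumption as above.

```python
import math
import random

def simulate_testing_phase(C, x0, q, rng):
    """Simulate measuring sum_{k != x0} (-1)^{C_k} |k> along the matching m_q
    and return the learner's answer (a, bit) to the question q."""
    N = len(C)
    amp = [0.0 if k == x0 else (-1.0) ** C[k] / math.sqrt(N - 1) for k in range(N)]
    edges = [((x0 + (2 * i + 1) * q) % N, (x0 + (2 * i + 2) * q) % N)
             for i in range((N - 1) // 2)]
    # Probability of landing in the subspace spanned by |a> and |b>.
    weights = [amp[a] ** 2 + amp[b] ** 2 for a, b in edges]
    a, b = rng.choices(edges, weights=weights)[0]
    # The collapsed state is |a> + |b> (equal signs) or |a> - |b> (opposite signs).
    bit = 0 if abs(amp[a] + amp[b]) > 1e-9 else 1
    return a, bit

rng = random.Random(1)
toy_C = [rng.randrange(2) for _ in range(11)]  # hypothetical concept, N = 11
toy_answer = simulate_testing_phase(toy_C, x0=4, q=3, rng=rng)
```

Whatever edge the measurement selects, the returned pair satisfies $C_a \oplus C_{a+q} = b$, in line with the claim that an answer, once given, is always correct.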

All quantum operations involved in the algorithm can be performed efficiently. 

^3 If we allow a slightly modified form of training examples, where $\ell$ is represented through the amplitude as
$\sum_{(j,x,b) \in \ell} (-1)^b |j, x\rangle$, then it is possible to PQ-learn $\mathcal{C}$ exactly from one such example.

3.2 C is unspeakable

Let us see that the concept class $\mathcal{C}$ is unspeakable. The following proof uses some ideas from
[BJK04] and [GKRW06].

Assume that $\mathcal{C}$ is approximated by a class $\mathcal{D}$. Then there exists some $h_0 \in \mathcal{D}$ that
simultaneously approximates at least $2^N / |\mathcal{D}|$ elements of $\mathcal{C}$; denote the set of those elements
by $\mathcal{C}_0$.

Consider the answers that $h_0$ gives to all possible queries $q \in [N-1]$. Denote $(x_q, i_q) = h_0(q)$ and let

$$Q_0 \stackrel{\mathrm{def}}{=} \{ q \mid (x_q, i_q) \text{ is a good answer to } q \text{ w.r.t. at least 3/5'th of } \mathcal{C}_0\text{'s elements} \}.$$

Counting reveals that $|Q_0| \ge \frac{N-1}{6}$.

Let $e_q \stackrel{\mathrm{def}}{=} (x_q, x_q + q)$ and $E_0 \stackrel{\mathrm{def}}{=} \{e_q \mid q \in Q_0\}$. Every edge $e_q$ corresponds to at most 2
different values of $q \in [N-1]$, therefore $|E_0| \ge \frac{N-1}{12}$. Consider a graph $G_0$ over $N$ nodes,
whose edges are the elements of $E_0$. Observe that $G_0$ contains at least $\sqrt{2|E_0|} \ge \sqrt{(N-1)/6}$
non-isolated vertices.

Let $F_0 \subseteq G_0$ be a forest consisting of a spanning tree for each connected component of
$G_0$. Then $F_0$ contains at least $\sqrt{(N-1)/24}$ edges, denote them by $E_0'$. Let $Q_0' \subseteq Q_0$ be a subset of
size $|E_0'|$, such that

$$E_0' = \{e_q \mid q \in Q_0'\}.$$

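View the elements of $\mathcal{C}$ as binary strings of length $N$. Let us consider two probability
distributions, one corresponding to uniformly choosing $C \in \mathcal{C}$ and the other corresponding
to uniformly choosing $C \in \mathcal{C}_0$; denote them by $D_{\mathcal{C}}$ and $D_{\mathcal{C}_0}$, respectively. Then

$$\log\left(\frac{|\mathcal{C}|}{|\mathcal{C}_0|}\right) = H[D_{\mathcal{C}}] - H[D_{\mathcal{C}_0}],$$

where $H[\cdot]$ denotes the binary entropy.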

For every $e_q = (a, b)$ put $I_q \stackrel{\mathrm{def}}{=} C_a \oplus C_b$, and let $J \stackrel{\mathrm{def}}{=} (I_q)_{q \in Q_0'}$. It is straightforward from
the construction of $Q_0'$ that if $C \sim D_{\mathcal{C}}$ then the collection $\{I_q \mid q \in Q_0'\}$ consists of mutually
independent unbiased Boolean random variables, and therefore $H_{D_{\mathcal{C}}}[J] = |Q_0'|$.

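As $H[C] = H[J] + H[C|J]$ holds w.r.t. any distribution of $C$,

$$\begin{aligned}
\log\left(\frac{|\mathcal{C}|}{|\mathcal{C}_0|}\right) &= H_{D_{\mathcal{C}}}[C] - H_{D_{\mathcal{C}_0}}[C] = H_{D_{\mathcal{C}}}[J] - H_{D_{\mathcal{C}_0}}[J] + H_{D_{\mathcal{C}}}[C|J] - H_{D_{\mathcal{C}_0}}[C|J] \\
&\ge H_{D_{\mathcal{C}}}[J] - H_{D_{\mathcal{C}_0}}[J] = |Q_0'| - H_{D_{\mathcal{C}_0}}[J] \\
&\ge |Q_0'| - \sum_{q \in Q_0'} H_{D_{\mathcal{C}_0}}[I_q] = \sum_{q \in Q_0'} \left(1 - H_{D_{\mathcal{C}_0}}[I_q]\right), \qquad (1)
\end{aligned}$$

where the first inequality follows from the fact that $H_{D_{\mathcal{C}}}[C|J] = N - |Q_0'|$, which is the
maximum of $H[C|J]$ under any distribution of $C$.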
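
From the definition of $Q_0$ (and the fact that $Q_0' \subseteq Q_0$), we know that each of $\{I_q \mid q \in Q_0'\}$
is at least 3/5-biased, therefore $H_{D_{\mathcal{C}_0}}[I_q] < \frac{49}{50}$, and (1) leads to

$$\log\left(\frac{|\mathcal{C}|}{|\mathcal{C}_0|}\right) \ge \frac{|Q_0'|}{50} \ge \frac{1}{50}\sqrt{\frac{N-1}{24}} \ge \frac{\sqrt{N}}{250}$$

for sufficiently large $N$. According to our choice of $h_0$,

$$|\mathcal{D}| \ge \frac{2^N}{|\mathcal{C}_0|} \ge 2^{\sqrt{N}/250},$$

which means that the class $\mathcal{C}$ is unspeakable.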
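The entropy estimate for a 3/5-biased bit, which drives the final bound, is easy to check numerically. The helper name below is hypothetical; the computation uses only the standard binary entropy formula.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a Boolean variable that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

entropy_at_bias = binary_entropy(3 / 5)   # roughly 0.971
deficiency = 1 - entropy_at_bias          # roughly 0.029, which exceeds 1/50
```

A bit that is biased even more strongly has smaller entropy, so every $I_q$ with $q \in Q_0'$ contributes more than $1/50$ to the sum in (1).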



4 Optimality of our separation 

The model of PQ where we demonstrated learnability of C is computationally feasible. But 
in the definition of PQ we have modified what is probably the most usual learning setting in 
several ways: Besides being quantum, our algorithm is predictive; moreover, the concept class 
that we learn is a relational one. In this section we will see that all these "enhancements" 
are essential in order to be able to learn an unspeakable class efficiently. 

We already know that classical learning of an unspeakable class cannot be efficient. We
will show that an exponential amount of training data is required in order to learn a functional
unspeakable concept (Lemma 4.4), as well as to learn any unspeakable concept in the
quasi-predictive setting (Lemma 4.5). Both results are established through making a connection
to one-way communication complexity: Our proof of Lemma 4.4 is based on Aaronson's [A04],
and in order to prove Lemma 4.5 we establish a new fact about one-way communication
complexity that might be of independent interest (Theorem 4.2, Corollary 4.3).



4.1 Quantum and classical one-way communication complexity 

The one-way model of communication complexity is defined as follows. Let $P \subseteq X \times Y \times Z$
be a (relational) two-party communication problem. Input to $P$ has the form $(x, y) \in X \times Y$;
in the beginning it is split between two players: Alice receives $x$ and Bob receives $y$. The
goal is for Bob to produce $z \in Z$, such that $(x, y, z) \in P$. The players cooperate to achieve
it, namely Alice sends a message $m$ to Bob, and he outputs $z \in Z$ based on the message $m$
and his portion of input $y$.

Assume for convenience that both the length of $y$ and the length of $m$ are functions of
the length of $x$, and denote the latter by $n = \lceil \log |X| \rceil$. Both Alice and Bob are all-powerful
computationally, and their goal is to solve the problem using as short m as possible. There 
are two versions of this model that we are interested in, namely quantum and classical. In 
the former the action of the players should obey the laws of quantum mechanics, in particular 
the message m is quantum and its "length" is measured in qubits; in the latter the message 
is classical and consists of bits. We let our protocols employ mixed strategies, i.e., shared 
randomness is allowed. 

For any $\varepsilon$ we say that a protocol $T$ solves $P$ with error $\varepsilon$ if Alice and Bob, who behave
according to $T$, produce a correct answer to every input $(x,y) \in X \times Y$ with probability at
least $1 - \varepsilon$. For a distribution $\mu$ over $X \times Y$ we say that $T$ solves $P$ with error $\varepsilon$ w.r.t. $\mu$
if a correct answer is produced with probability at least $1 - \varepsilon$ when $(x,y) \sim \mu$. The $\varepsilon$-error
communication cost of $P$ is the smallest possible message length of a protocol that solves $P$
with error $\varepsilon$, and $\varepsilon$-error communication cost w.r.t. $\mu$ is defined similarly. We say that the
bounded-error cost of $P$ is at most $k$ if for any $\varepsilon \in \Omega(1)$ its $\varepsilon$-error cost is at most $k$.

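Denote by $\mathcal{R}^1_\varepsilon(P)$ ($\mathcal{R}^1_{\mu,\varepsilon}(P)$) the classical one-way $\varepsilon$-error communication cost of $P$ (w.r.t.
$\mu$), and by $\mathcal{R}^1(P)$ its bounded-error classical cost. Denote by $\mathcal{Q}^1_\varepsilon(P)$, $\mathcal{Q}^1_{\mu,\varepsilon}(P)$ and $\mathcal{Q}^1(P)$
the corresponding quantum analogs.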

An important special case of relational communication problems are functional problems
(partial or total). The following theorem follows readily from Theorem 6 of [A04]:

Theorem 4.1. [A04] For any functional two-party communication problem $F : X \times Y \to Z$,
it holds that $\mathcal{R}^1_\varepsilon(F) \in O\left(\log(|Y|)\, \mathcal{Q}^1_\varepsilon(F) \log \mathcal{Q}^1_\varepsilon(F)\right)$ for any $\varepsilon < 1/2 - \Omega(1)$.

4.2 One-way communication when Bob receives no input 

In this section we present a new result in communication complexity; it will be used later to
prove Lemma 4.5.

Consider a special case of one-way communication that we call single-input mode, where
Bob receives no input. Formally, Bob's input set is the singleton $\{0\}$: let $P \subseteq X \times \{0\} \times Z$ be a communication task
where Alice receives $x$ and sends a single message $m$ to Bob, who has to output $z \in Z$ based
on the message $m$ alone.

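This setting is not as trivial as it may appear at first glance.^4 For instance, any communication
problem with two-sided input $P \subseteq X \times Y \times Z$ has a single-input analogue
$P'_{\mu,\varepsilon} \subseteq X \times \{0\} \times Z^{|Y|}$, where Bob has to produce a list of answers to the original $P$ w.r.t.
all $y \in Y$. Namely, let

$$P'_{\mu,\varepsilon} \stackrel{\mathrm{def}}{=} \left\{ \big(x, 0, (z_y)_{y \in Y}\big) \,\middle|\, \Pr_{y \sim \mu^B_x}\left[(x, y, z_y) \in P\right] \ge \varepsilon \right\},$$

where $\mu$ is a distribution on $X \times Y$ and $\mu^B_x$ is the marginal distribution of $B$ when $(A, B) \sim \mu$
and $A = x$. Note that for any $\mu$ and $\varepsilon \in \Omega(1)$, $\mathcal{R}^1(P'_{\mu,\varepsilon}) \le \mathcal{R}^1(P)$, and on the other hand,
by the Minimax theorem $\mathcal{R}^1(P) = \sup\{\mathcal{R}^1(P'_{\mu,\varepsilon})\}$, where the supremum is taken w.r.t. all
possible $\mu$ and $\varepsilon \in O(1)$.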

In other words, $P'_{\mu,\varepsilon}$ is essentially as difficult to solve in the model of one-way classical
communication as $P$ is. Somewhat surprisingly, the same is not true in the case of quantum
communication. More generally, below we show that for any single-input communication
task the quantum and the classical one-way costs are asymptotically the same. In particular,
this means that $\mathcal{Q}^1(P)$ can be exponentially smaller than $\mathcal{Q}^1(P'_{\mu,\varepsilon})$ for some $\varepsilon \in \Omega(1)$; this
happens if and only if the gap between $\mathcal{Q}^1(P)$ and $\mathcal{R}^1(P)$ is exponential (examples of such
$P$ were given in [BJK04], [GKKRW07]).

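Theorem 4.2. For any relational two-party communication problem $P \subseteq X \times \{0\} \times Z$, any
distribution $\mu$ over $x \in X$ and any $\Omega(1) < \varepsilon < 1 - \Omega(1)$, it holds that $\mathcal{R}^1_{\mu,\varepsilon}(P) \in O\left(\mathcal{Q}^1_{\mu,\varepsilon}(P)\right)$.

Corollary 4.3. For any $P \subseteq X \times \{0\} \times Z$, it holds that $\mathcal{R}^1(P) \in O\left(\mathcal{Q}^1(P)\right)$.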



Proof. By the Minimax theorem, for every $\varepsilon$ there exists $\mu$ such that $\mathcal{R}^1_\varepsilon(P) = \mathcal{R}^1_{\mu,\varepsilon}(P)$.

If $P$ is a function then Corollary 4.3 is a trivial special case of Theorem 4.1. On
the other hand, Corollary 4.3 applies to the much more general case of relational problems,
where a statement analogous to Theorem 4.1 provably does not hold.

^4 It is important that we consider relational problems; for functions the single-input mode is indeed
uninteresting.



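Proof of Theorem 4.2. Let $W$ be a valid $\mathcal{Q}^1_{\mu,\varepsilon}$-protocol of cost $m$ for $P$, i.e., $W$ guarantees
error at most $\varepsilon$ w.r.t. $x \sim \mu$. We want to build an $\mathcal{R}^1_{\mu,\varepsilon}$-protocol of cost $O(m)$.

Let $A$ and $B$ be random variables taking the value of Alice's input $x \in X$ and Bob's
answer $z \in Z$, respectively. Assume $A \sim \mu$ and let $\mu^B$ be the corresponding distribution of
$B$. Conditional upon $A = x$ let $B \sim \mu^B_x$. Define a random variable $B'$ as a "refined version"
of $B$, namely: if $A = x$ then the conditional distribution of $B'$ is

$$\mu^{B'}_x(z) \stackrel{\mathrm{def}}{=} \begin{cases} \dfrac{\mu^B_x(z)}{1 - \varepsilon_x} & \text{if } (x, 0, z) \in P, \\ 0 & \text{otherwise,} \end{cases}$$

where $1 - \varepsilon_x$ is the probability that $W$ returns a correct answer on input $x$.
By the Holevo bound and the information processing principle,

$$m \ge I[A : B] = \mathop{\mathbf{E}}_{A = x}\left[ d_{KL}\left(\mu^B_x \,\|\, \mu^B\right) \right].$$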

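For every $x \in X$,

$$d_{KL}\left(\mu^{B'}_x \,\|\, \mu^B\right) = \sum_{z} \mu^{B'}_x(z) \log\frac{\mu^{B'}_x(z)}{\mu^B(z)} \le \frac{1}{1 - \varepsilon_x}\left( d_{KL}\left(\mu^B_x \,\|\, \mu^B\right) + 1 \right) + \log\frac{1}{1 - \varepsilon_x}.$$

By linearity of expectation

$$\mathop{\mathbf{E}}_{A = x}\left[ d_{KL}\left(\mu^{B'}_x \,\|\, \mu^B\right) \right] \le \frac{m + 1}{1 - \varepsilon} + \log\frac{1}{1 - \varepsilon} \le \frac{2m}{1 - \varepsilon} \qquad (2)$$

for sufficiently large $m$.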



We claim that there exists an $\mathcal{R}^1_{\mu,\varepsilon}$-protocol for $P$ of cost $\frac{11m}{\varepsilon(1-\varepsilon)}$. By the definition of
$B'$, any $z$ in the support of $\mu^{B'}_x$ is a correct answer to $x \in X$. The key observation is that $\mu^{B'}_x$
is not too far from $\mu^B$, by (2). Therefore, if Alice and Bob sample sufficiently many elements
from $\mu^B$, with high probability at least one of them would belong to the support of $\mu^{B'}_x$. Such
sampling can be performed by the players locally, using shared randomness. Then Alice can
send a pointer to an element which is a good answer w.r.t. her $x$.
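The sampling-and-pointer idea can be sketched in a few lines. All names below are hypothetical, the shared randomness is modelled by a common seed, and the toy sampler and correctness predicate merely stand in for the distribution of Bob's answers and for membership in $P$.

```python
import math
import random

def shared_samples(sample_answer, M, seed):
    """Both players derive the same list of M candidate answers from the shared seed."""
    rng = random.Random(seed)
    return [sample_answer(rng) for _ in range(M)]

def alice_message(x, is_correct, sample_answer, M, seed):
    """Alice sends the index of the first shared sample that is correct for her x."""
    for index, z in enumerate(shared_samples(sample_answer, M, seed)):
        if is_correct(x, z):
            return index
    return None  # the protocol errs if no sample is correct

def bob_output(index, sample_answer, M, seed):
    """Bob regenerates the shared list and outputs the element Alice pointed to."""
    return shared_samples(sample_answer, M, seed)[index]

# Toy stand-ins: answers are uniform over {0, ..., 15}; z is correct for x if z % 4 == x.
def toy_sampler(rng):
    return rng.randrange(16)

def toy_is_correct(x, z):
    return z % 4 == x

M = 64
index = alice_message(2, toy_is_correct, toy_sampler, M, seed=7)
answer = None if index is None else bob_output(index, toy_sampler, M, seed=7)
message_bits = math.ceil(math.log2(M))  # length of the pointer in bits
```

The message length depends only on the number of shared samples $M$, which is why the argument reduces to lower-bounding the probability that a single sample is a correct answer.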

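Let us estimate the probability that a randomly chosen $z \sim \mu^B$ satisfies $\mu^{B'}_x(z) > 0$. Let

$$X' \stackrel{\mathrm{def}}{=} \left\{ x \in X \,\middle|\, d_{KL}\left(\mu^{B'}_x \,\|\, \mu^B\right) \le \frac{5m}{\varepsilon(1-\varepsilon)} \right\},$$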
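then it follows from (2) that $\mu(X') \ge 1 - \varepsilon/2$. Fix any $x_0 \in X'$ and let $Z' \stackrel{\mathrm{def}}{=} \{z \mid \mu^{B'}_{x_0}(z) > 0\}$.
From

$$\frac{5m}{\varepsilon(1-\varepsilon)} \ge d_{KL}\left(\mu^{B'}_{x_0} \,\|\, \mu^B\right) = \sum_{z \in Z'} \mu^{B'}_{x_0}(z) \log\frac{\mu^{B'}_{x_0}(z)}{\mu^B(z)}$$

it follows that

$$\Pr_{z \sim \mu^{B'}_{x_0}}\left[ \frac{\mu^{B'}_{x_0}(z)}{\mu^B(z)} < 2^{\frac{10m}{\varepsilon(1-\varepsilon)}} \right] \ge \frac{1}{2}.$$

Let $Z'' \stackrel{\mathrm{def}}{=} \left\{ z \in Z' \,\middle|\, \mu^B(z) > \mu^{B'}_{x_0}(z) \cdot 2^{-\frac{10m}{\varepsilon(1-\varepsilon)}} \right\}$, then $\mu^B(Z'') \ge 2^{-\frac{10m}{\varepsilon(1-\varepsilon)} - 1}$.
We have that for any $x_0 \in X'$,

$$\Pr_{z \sim \mu^B}\left[(x_0, 0, z) \in P\right] = \mu^B(Z') \ge \mu^B(Z'') \ge 2^{-\frac{10m}{\varepsilon(1-\varepsilon)} - 1}.$$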



If we sample $M \stackrel{\mathrm{def}}{=} 2^{\frac{11m}{\varepsilon(1-\varepsilon)}}$ elements from $\mu^B$ then with probability greater than $1 - \varepsilon/3$ at
least one of them is a correct answer w.r.t. the given $x_0$, whenever $x_0 \in X'$. As the latter
happens with probability at least $1 - \varepsilon/2$, the unconditional probability that one of the $M$
elements is a correct answer is greater than $1 - \varepsilon$. A pointer to one of $M$ elements requires
$\frac{11m}{\varepsilon(1-\varepsilon)}$ bits, and that is the cost of our $\mathcal{R}^1_{\mu,\varepsilon}$-protocol for $P$, as required.

■ (Theorem 4.2)



4.3 Connection to learnability of unspeakable concepts 

Let us see how Theorem 4.1 and Corollary 4.3 imply that our construction in Theorem 1.1 is
tight. First, let us see that no unspeakable functional concept class can be efficiently learned
even in a quantum predictive learning model.

Lemma 4.4. Predictive learning of an unspeakable functional concept class is not possible 
from less than exponential amount of quantum (w.l.g.) information from the teacher, even 
by a computationally unlimited learner. 

Proof. Assume that for some functional concept class $\mathcal{F}$ that is unspeakable, the following
holds. A teacher $T$ knows some $f_0 \in \mathcal{F}$, hidden from a learner $S$. Then $T$ exchanges at most
$k_q$ qubits with $S$. Finally, $S$ is given some $x_0$ from the domain $X$ of the functions in $\mathcal{F}$ and
is able to compute $f_0(x_0)$ with confidence at least 5/6.

Consider the following two-party communication task $Q$. Alice receives $f_0 \in \mathcal{F}$, Bob
receives $x_0 \in X$ and they have to output $f_0(x_0)$. Clearly, $\mathcal{Q}^1_{1/6}(Q) \le k_q$.

Let $k_c \stackrel{\mathrm{def}}{=} \mathcal{R}^1_{1/6}(Q)$. As $\mathcal{F}$ is unspeakable, $k_c \in 2^{\Omega(n)}$. By Theorem 4.1, $k_c \in O(n \cdot k_q \log(k_q))$,
and so $k_q \in 2^{\Omega(n)}$, as required. ■

Now we show that unspeakable concepts cannot be efficiently learned in the quasi- 
predictive (or standard) setting: 

Lemma 4.5. Both standard and quasi-predictive learning of an unspeakable concept class 
is not possible from less than exponential amount of quantum (w.l.g.) information from the 
teacher, even by a computationally unlimited learner. 

Proof. It is enough to prove the statement only for quasi-predictive learning, as the standard
model can be viewed as a special case.

Let $\mathcal{C}$ be an unspeakable concept class consisting of relations over $X \times Y$; assume that it is
learnable in the quasi-predictive model by a protocol of cost $k_q$. Then there exists a protocol,
according to which a teacher $T$ who knows some $\ell_0 \in \mathcal{C}$ exchanges at most $k_q$ qubits with a
learner $S$ who doesn't know $\ell_0$. Nevertheless, afterward $S$ is able to answer with sufficient
confidence any number of testing questions regarding $\ell_0$.

For us it is enough to consider the testing phase where all possible $x \in X$ are asked (say,
in the lexicographic order) and the learner responds with $(y_x)_{x \in X}$, such that

$$\forall (\ell, x) \in \mathcal{C} \times X : \Pr\left[(x, y_x) \in \ell\right] \ge 5/6,$$

where the probability is taken w.r.t. possible runs of the learning protocol for the given $\ell \in \mathcal{C}$.
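Define a relational single-input communication problem $P_{\mathcal{C}} \subseteq \mathcal{C} \times \{0\} \times Y^{|X|}$ as

$$P_{\mathcal{C}} \stackrel{\mathrm{def}}{=} \left\{ \big(\ell, 0, (y_x)_{x \in X}\big) \,\middle|\, \Pr_{x \sim D}\left[(x, y_x) \in \ell\right] \ge 2/3 \right\}.$$
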

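The learning protocol for $\mathcal{C}$ that we considered above can be turned into a $\mathcal{Q}^1$-protocol of
cost $k_q$ for $P_{\mathcal{C}}$ that is correct with probability $1 - o(1)$ w.r.t. every $\ell_0 \in \mathcal{C}$, in particular
$\mathcal{Q}^1(P_{\mathcal{C}}) \le k_q$. By Corollary 4.3, $\mathcal{R}^1(P_{\mathcal{C}}) \in O(k_q)$.

Any $\mathcal{R}^1$-protocol of cost $k_c$ for $P_{\mathcal{C}}$ readily leads to an approximating class for $\mathcal{C}$ of size
$2^{k_c}$. As $\mathcal{C}$ is unspeakable, $k_c \in 2^{\Omega(n)}$, where $n = \log |X|$. Therefore, $k_q \in 2^{\Omega(n)}$, as required.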

■ 

For simplicity, in the two proofs above we assumed the distribution-free mode of learning,
where the learner in the testing phase had to give a correct answer to any $x \in X$ with high
probability. Distributional versions of Lemmas 4.4 and 4.5 can be proved similarly.